Traffic analysis apparatus and analysis method

ABSTRACT

A traffic analysis apparatus includes: a packet transmitter/receiver; a packet aggregating unit, for adding the number of packets that employ the same values for items in a combination that includes one arbitrary item or multiple items in packets obtained by the packet transmitter/receiver; a variety aggregating unit, for adding the number of appearances of different values in the items that are not included in the combination; and a packet estimation unit for, when the total number of packets is greater than a designated threshold value, employing a relationship between the values of the items of the combination formed of one arbitrary item or multiple items, the number of appearances of different values and the threshold value, and estimating the characteristics of the packets for which the number has exceeded the threshold value.

INCORPORATION BY REFERENCE

The present application claims priority from Japanese application JP2006-321020 filed on Nov. 29, 2006, the content of which is hereby incorporated by reference into this application.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a traffic analysis apparatus and a traffic analysis method for analyzing the characteristics of traffic on a network. The present invention relates in particular to a traffic analysis apparatus and a traffic analysis method for efficiently detecting, in a large volume of traffic, traffic that occupies an extraordinarily broad band, and for detecting and indicating the characteristics of that traffic.

2. Description of the Related Art

As the use of the Internet and LANs has grown, the stable operation of these networks has become dramatically more important. A huge and unspecified number of users download and employ a great variety of applications that are available on the Internet. The probability is therefore high either that the volume of regular traffic will increase and eventually exceed that which has been estimated, by Internet service providers, for example, or that there will be a drastic increase in malicious traffic for the distribution of malware such as worms and viruses. How to detect such varied traffic and how to ascertain its characteristics has become a problem for which a solution is urgently required.

As means for resolving this problem, a technique by which to specify, for subsequent characteristic extraction, excessive and malign transmissions included in a large volume of traffic flowing via a large-scale network, such as the Internet backbone, is disclosed in JP-A-2005-285048. According to this technique, frequent traffic, i.e., traffic that probably is excessive or malign, is extracted from a large volume of traffic data using a basket analysis method, which facilitates the analysis of a large amount of data and the extraction, from the data, of combinations of items for which the inclusion frequency is high. This technique also includes a feature that permits an analysis to be performed by referring only to the header data portions required for traffic data transmitted via a network.

Further, as a traffic analysis method, the “number of varieties” (cardinality), for example the number of destination hosts with which a specific host communicates, has drawn attention, since it can be employed to provide a parameter that is characteristic of a specific type of traffic. When cardinality is employed, an attack that is hard to identify when using only simple information, such as the quantity of communication data, or malign traffic, for which the purpose is network scanning, can be identified comparatively accurately. Cardinality information can also be obtained by referring only to the header information portion of traffic data that is required for transmission via a network. Generally, in order to obtain a count for cardinality, all values that appear (e.g., the addresses of opposite communication parties, when the number of such parties is to be counted as the cardinality) must be stored, and for this, a large memory capacity is required. As one method for providing a solution to this problem, a technique is disclosed in NetHost: Aggregation of Traffic Summary Per-Host, 2006 IEICE General Conference, BS-5-2. According to this technique, instead of directly storing a target value, a hash value is calculated, and the fact that the target value appeared is recorded in the bit on a bitmap that corresponds to the hash value. In this manner, the required memory size can be reduced, and the hash value can be used for the cardinality count.
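The bitmap approach described above can be illustrated with a minimal sketch. This is not the implementation of the cited technique; the class name BitmapCounter, the bitmap size and the use of MD5 as the hash function are assumptions made only for illustration.

```python
import hashlib


class BitmapCounter:
    """Records appearances of values as bits on a fixed-size bitmap.

    Hypothetical sketch: the bitmap size and hash function are assumptions.
    """

    def __init__(self, num_bits: int = 1024) -> None:
        self.num_bits = num_bits
        self.bits = bytearray((num_bits + 7) // 8)

    def add(self, value: str) -> None:
        # Hash the value and set the bit that corresponds to the hash value,
        # instead of storing the value itself.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        index = int.from_bytes(digest, "big") % self.num_bits
        self.bits[index // 8] |= 1 << (index % 8)

    def bits_set(self) -> int:
        # The number of set bits never exceeds the number of distinct values
        # added; hash collisions can make it smaller.
        return sum(bin(b).count("1") for b in self.bits)
```

In such a sketch the memory used is fixed by the bitmap size regardless of how many values appear, at the cost of undercounting when two values hash to the same bit.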

According to the conventional art in JP-A-2005-285048, since a data mining technique is employed for the extraction of excessive or malign traffic, the rapid processing of a large amount of traffic is enabled, without imposing any limitations on a target being monitored and by employing only the header information for packets. However, since information that is useful for cardinality calculations, for identifying traffic characteristics, is not collected, it is not possible to determine the source applications for the frequent traffic data that were extracted, nor is it possible to determine what types of malign traffic were intercepted.

Further, for the technique described in JP-A-2005-285048, the technique described in NetHost: Aggregation of Traffic Summary Per-Host, for example, may also be employed as means for collecting additional analysis information. However, the technique described in JP-A-2005-285048 is a method whereby, without physically limiting monitoring target traffic, data mining is performed, while information related to multiple traffic types is stored at the same time. Thus, when this technique and the one in NetHost: Aggregation of Traffic Summary Per-Host are employed together, a cardinality counting memory must be prepared for each of multiple traffic types that are currently being analyzed. As a result, in total, a very large memory capacity is required.

SUMMARY OF THE INVENTION

One objective of the present invention is to provide a traffic analysis apparatus and a traffic analysis method for detecting and extracting malign traffic on a network, such as the Internet backbone network, via which there is an enormous flow of traffic, and for preparing estimations for the characteristics of all malign traffic detected.

Another objective of the present invention is to provide a traffic analysis apparatus and a traffic analysis method that require only a small memory resource, and that enable the extraction of traffic deemed malign and the preparation of estimations for the characteristics of the malign traffic, without imposing any limitations on a target being monitored.

To achieve the objectives, according to the present invention, a traffic analysis apparatus comprises:

an accumulation unit, for aggregating the number of packets for each arbitrary combination of items that are included in a packet header portion that is transmitted;

a unit for aggregating the number of times different values appear that are indicated in items that are not included in the arbitrary combination; and

a unit for determining whether a packet count obtained by the accumulation unit is greater than a predetermined threshold value,

wherein, when the packet count exceeds the threshold value, the type of packet that is transmitted is determined based on the association among the arbitrary combination of items, the threshold value and the total appearance count aggregated for the different values.

Further, to achieve the above objectives, according to the invention, the appearance count for different values of an item that is not included in an arbitrary combination of items included in the header portions of transmitted packets is aggregated as follows. As the means of recording which values have already appeared, an arrangement is employed wherein, at a step of adding up the number of packets for a new combination, which is obtained by adding to the arbitrary combination an item that is not included in it, the appearance of a different value is counted when the new combination first appears.
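The counting arrangement described above may be easier to follow with a small sketch. The following is a simplified illustration only, assuming that the base combination is the transmission source IP address and that the added item is the destination IP address; the names observe, packet_counts and variety_counts are hypothetical, and unbounded dictionaries stand in for the fixed-size tables of the apparatus.

```python
from collections import defaultdict

# Packet counts per combination of item values (hypothetical stand-in for
# the per-combination aggregation).
packet_counts = defaultdict(int)
# Number of different destination values seen per base combination.
variety_counts = defaultdict(int)


def observe(src_ip: str, dst_ip: str) -> None:
    """Aggregate one packet for the base and the extended combination."""
    base = (src_ip,)              # arbitrary combination: source IP only
    extended = (src_ip, dst_ip)   # new combination: source IP + destination IP

    # The extended combination appearing for the first time means that this
    # destination value is new for the base combination, so no separate
    # store of past destination values is consulted.
    if extended not in packet_counts:
        variety_counts[base] += 1

    packet_counts[base] += 1
    packet_counts[extended] += 1
```

In this sketch the per-combination packet counts, which are being kept anyway, double as the record of which values have already appeared.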

According to the invention, for the extraction of an improper packet and an estimation prepared for the characteristic of the packet, packet pattern matching, which takes processing time, is not required, and simply a statistical process related to header information of a packet need be performed. Therefore, the invention can also be applied to a fast network along which traffic is heavy.

Further, the number of appearances of different values related to a specific item included in the header of a packet can be added up without a special storage area being prepared for the storage of values that appeared in the past. Therefore, even for a fast network along which traffic is heavy, only a small amount of memory resources is required to perform, using the number of varieties, analyses of the traffic characteristics.

Other objects, features and advantages of the invention will become apparent from the following description of the embodiments of the invention taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating an example configuration for a traffic analysis apparatus according to a first embodiment of the present invention;

FIG. 2 is a diagram illustrating example contents of a packet count table according to the first embodiment;

FIG. 3 is a diagram illustrating an example structure for bits of an item set value according to the first embodiment;

FIG. 4 is a diagram illustrating example contents for an extraction target table according to the first embodiment;

FIG. 5 is a diagram illustrating example contents for an auxiliary table according to the first embodiment;

FIG. 6 is a diagram illustrating example contents for an extraction host table according to the first embodiment;

FIG. 7 is a diagram illustrating example contents for a P2P extraction table according to the first embodiment;

FIG. 8 is a diagram illustrating an example estimation threshold value table according to the first embodiment;

FIG. 9 is a flowchart showing the overview processing performed by the traffic analysis apparatus of the first embodiment;

FIG. 10 is a flowchart showing a packet count table updating process for the first embodiment;

FIG. 11 is a flowchart showing a variety counting process;

FIG. 12 is a flowchart showing a host information extraction process;

FIG. 13 is a diagram for explaining the outline of a P2P file exchanging process according to the first embodiment;

FIG. 14 is a diagram illustrating an example initial setup information input screen according to the first embodiment;

FIG. 15 is a diagram illustrating an example host information output screen according to the first embodiment;

FIG. 16 is a diagram illustrating an example configuration for a traffic analysis apparatus according to a second embodiment of the present invention;

FIG. 17 is a diagram illustrating example contents of a P2P extraction table according to the second embodiment;

FIG. 18 is a diagram illustrating an example estimation threshold value table according to the second embodiment;

FIG. 19 is a flowchart illustrating the overview of the processing performed by the traffic analysis apparatus of the second embodiment;

FIG. 20 is a diagram for explaining the outline of a P2P file exchanging process of the second embodiment;

FIG. 21 is a diagram illustrating an example initial setup information input screen according to the second embodiment; and

FIG. 22 is a diagram illustrating an example host information output screen according to the second embodiment.

DESCRIPTION OF THE EMBODIMENTS

The preferred embodiments of the present invention will now be described in detail while referring to the accompanying drawings. The present invention is not limited to these embodiments.

[First Embodiment]

FIG. 1 is a diagram showing the configuration of a traffic analysis apparatus 101 according to a first embodiment of the present invention. In FIG. 1, the traffic analysis apparatus 101 is connected to a network 102 and an input/output device 103. The network 102 is a network via which traffic to be analyzed flows, and the input/output device 103 either issues instructions to the traffic analysis apparatus 101, or displays the analysis results.

The traffic analysis apparatus 101 includes: a packet transmitter/receiver 105, a memory 106, a packet aggregating unit 120, a variety aggregating unit 121, a packet estimation unit 122 and a controller 123.

The packet transmitter/receiver 105 receives, via the network 102, traffic information to be analyzed. The memory 106 includes: a traffic information buffer 107, for temporarily storing the received traffic information; a packet count table 108, for storing a statistical value related to the traffic information; an extraction target table 109, for storing information that designates a type of flow to be extracted from the traffic information; an auxiliary counting table 110, for temporarily storing information required for adding the numbers of the varieties that are included in the statistical value related to the traffic information; a host table 111, for storing the results obtained by estimating, based on the flow type, a host operation related to the flow; a P2P extraction table 112, for storing information, which is required for estimating whether the host operation is a P2P file exchange application, and the estimation results; and an estimation threshold value table 113, for storing threshold value information that is required for estimating a host type.

The packet aggregating unit 120 aggregates the number of packets for which the same value is employed for the individual items of an item group, which is a combination of items (e.g., a transmission source IP address, a destination IP address, a transmission source port number and a destination port number) included in a packet that is exchanged via the network 102. The variety aggregating unit 121 aggregates the number of times a different value has appeared in an item that is not included in the item group. The packet estimation unit 122 employs the aggregation information for estimating the characteristics of packets that are being exchanged via the network 102. The controller 123, for controlling the processing of the traffic analysis apparatus 101, controls the performance of all processing except that which is performed by the packet aggregating unit 120, the variety aggregating unit 121 and the packet estimation unit 122. The packet aggregating unit 120, the variety aggregating unit 121, the packet estimation unit 122 and the controller 123 may be provided using individual hardware components, or may be provided by a single hardware component, such as a CPU, that can perform these processes. Further, software products (programs) for performing the individual processing functions may be prepared and executed by the CPU.

With the above described configuration, the traffic analysis apparatus 101 receives, via the network 102, traffic information that is analyzed and used to estimate whether the traffic is excessive or malign, and displays the estimation results on the input/output device 103.

Example structures of the tables included in the memory 106, from the packet count table 108 to the estimation threshold value table 113, will be described while referring to FIGS. 2 to 8. Further, an example operation of the traffic analysis apparatus 101 will be described in detail by employing the flowcharts in FIGS. 9 to 12.

FIG. 2 is a diagram showing an example for the packet count table 108. For this embodiment, the following analysis system is employed. For individual packets included in traffic information received via the network 102, a specific combination (called an item set) of packet elements is focused on, and the number of packets which have the same values in the combination of the elements (a set of these packets is called a flow) is added up. As a result, a flow to be focused on is extracted from the traffic information, and the operation of a host that is related to the flow is estimated.

The packet count table 108 is used to store the aggregation results, and the result obtained by one aggregation is stored in one entry. One entry includes: an entry number 108 a, which uniquely identifies the entry; an item set value 108 b, which indicates an item set for a flow of packets to be aggregated for the pertinent entry; a transmission source IP address 108 c, which indicates either a transmission source IP address value included in the flow or the number of varieties; a destination IP address 108 d, which indicates either a destination IP address value included in the flow or the number of varieties; a transmission source port number 108 e, which indicates either a transmission source port number value included in the flow or the number of varieties; a destination port number 108 f, which indicates either a destination port number value included in the flow or the number of varieties; a packet count 108 g, which indicates the total number of packets of which the flow consists; an aggregated byte count 108 h, which indicates a value obtained by adding the lengths of the packets of which the flow consists; and a counting start time 108 i, which is the time at which the aggregation process was started for the entry.
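As an aid to reading, one entry of the packet count table 108 might be modelled as the following record. This is a sketch only; the class name PacketCountEntry, the field names, the Python types and the add_packet helper are assumptions, and only the correspondence to the items 108 a to 108 i is taken from the description.

```python
from dataclasses import dataclass
from typing import Union

# Each of the four address/port fields holds either a concrete value (when
# the item belongs to the item set) or a variety count (when it does not).
ValueOrVariety = Union[str, int]


@dataclass
class PacketCountEntry:
    entry_number: int            # 108 a
    item_set_value: int          # 108 b
    src_ip: ValueOrVariety       # 108 c
    dst_ip: ValueOrVariety       # 108 d
    src_port: int                # 108 e
    dst_port: int                # 108 f
    packet_count: int = 0        # 108 g
    byte_count: int = 0          # 108 h
    counting_start_time: float = 0.0  # 108 i

    def add_packet(self, length: int) -> None:
        """Hypothetical helper: aggregate one packet of the flow."""
        self.packet_count += 1
        self.byte_count += length
```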

The individual entries stored in the packet count table 108 are results obtained by analyzing the information for packets that are exchanged via the network 102, and may also be regarded as analysis information that includes values entered in the items 108 b to 108 i. The packet count table 108 can also be regarded as an analysis information storage unit in which those multiple entries (analysis information sets) are stored.

A bit pattern that indicates values to be stored in the item set value 108 b is shown in FIG. 3. The value entered in the item set value 108 b represents a condition as to whether the values entered in the transmission source IP address 108 c, the destination IP address 108 d, the transmission source port number 108 e and the destination port number 108 f either are values used for the elements of the item set, or are variety count values, which indicate how many different types of values appeared.
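One way to picture such an item set value is as a bit mask with one bit per item. The bit positions below are an assumption made for illustration and are not taken from FIG. 3; the names Item and describe are likewise hypothetical.

```python
from enum import IntFlag


class Item(IntFlag):
    # Assumed bit assignment: a set bit means the field holds an item set
    # element value; a cleared bit means the field holds a variety count.
    SRC_IP = 0b0001
    DST_IP = 0b0010
    SRC_PORT = 0b0100
    DST_PORT = 0b1000


def describe(item_set_value: int) -> dict:
    """Report, for each item, how the corresponding field is interpreted."""
    return {
        item.name: "value" if item_set_value & item else "variety count"
        for item in Item
    }
```

Under this assumed layout, an item set value of 0b0011 would designate a flow identified by the transmission source IP address and the destination IP address, with the two port number fields used as variety counts.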

In the description for this embodiment, four element types, i.e., a transmission source IP address, a destination IP address, a transmission source port number and a destination port number, are employed as the items that form an item set. However, the elements to be processed are not limited to these, and the values of other items included in an IP header, a TCP header or a UDP header, or part of the data that follow the TCP header or the UDP header may be employed in accordance with the purpose of analysis.

Furthermore, the values of other items included in the IP header, the TCP header or the UDP header of an IP packet, which is stored in a packet for a tunneling protocol, such as L2TP or PPP, or part of the data that follow the TCP header or the UDP header, may be employed.

FIG. 4 is a diagram showing example contents of the extraction target table 109. The extraction target table 109 is a table wherein the type of flow to be used for a statistical process and the contents of statistical information to be obtained are defined based on traffic information that is received via the network 102. The type of flow and the contents of statistical information to be obtained are defined for one entry. One entry includes: an entry number 109 a, which uniquely identifies the entry; an item set value 109 b, which indicates the type of flow; a threshold value 109 c, which designates a timing for starting the process for estimating a host operation that concerns the flow; and a variety count updating targeted item set 109 d that indicates an item set for which the number of varieties is added up in the entry processing.

FIG. 5 is a diagram showing example contents of the auxiliary counting table 110. The auxiliary counting table 110 is an auxiliary work table employed when the number of varieties is counted using the packet count table 108. In the analysis process for one packet that is included in traffic information received via the network 102, the auxiliary counting table 110 is employed to temporarily store, for each item set for use in the statistical process, the entry number of an entry in the packet count table 108 that is employed for adding up the number of packets and that includes the item set. Specifically, the auxiliary counting table 110 includes a plurality of entries, each of which consists of a field 110 a, for storing the item set value of the item set; and a field 110 b, for storing the entry number.

FIG. 6 is a diagram showing example contents of the host table 111. The host table 111 includes a plurality of entries for storing information, about the operation of a host, that is related to a targeted flow extracted from traffic information received via the network 102. One entry includes: an entry number 111 a, which uniquely identifies the entry; an IP address 111 b, for the host; a host type 111 c, which indicates whether the host is operating as a server or as a client; a service port number 111 d, which the host employs to provide a service or to receive a service; a detected threshold value 111 e, which indicates the packet count of the targeted flow at the time at which a recording was made for the entry; a sender count (IN) 111 f, which indicates the number of hosts serving as transmission sources for the packets included in the targeted flow that the host received; a recipient count (OUT) 111 g, which indicates the number of hosts serving as destinations for the packets included in the targeted flow that the host transmitted; a measured period (IN) 111 h, which is the time required, from the start of the counting of the packets in the targeted flow the host received, for the count to reach the detected threshold value 111 e; a measured period (OUT) 111 i, which is the time required, from the start of the counting of the packets in the targeted flow that the host transmitted, for the count to reach the detected threshold value 111 e; an average band (IN) 111 j, for storing the average data volume for each hour, from the start of the counting of packets in the targeted flow received by the host, that continued until the count reached the detected threshold value 111 e; an average band (OUT) 111 k, for storing the average data volume, from the start of the counting of the packets in the targeted flow transmitted by the host, that continued until the count reached the detected threshold value 111 e; a DDoS attack/network scan flag 111 l, which indicates the results of an estimation made as to whether the targeted flow was a DDoS attack or a network scan; and a latest update time 111 m, which indicates the latest time at which the contents of the entry were updated.

FIG. 7 is a diagram showing example contents of the P2P extraction table 112. The P2P extraction table 112 includes a plurality of entries for storing information required to estimate whether the operation of a host related to a targeted flow, which is extracted from traffic information received via the network 102, was a P2P file exchanging application, and the results obtained by the estimation. One entry includes: an entry number 112 a, which uniquely identifies the entry; an IP address 112 b of the host; a P2P estimation flow detection count 112 c, which indicates the number of detected P2P estimation flows that were employed for an estimation as to whether the host operation was a P2P file exchange application; a P2P estimation result 112 d, which indicates results obtained by an estimation as to whether the operation was a P2P file exchange application; and a variety count distribution parameter A 112 e and a variety count distribution parameter B 112 f, which indicate parameters, obtained through calculations based on numerical values included in the statistical values for an extracted P2P estimation flow, that were employed for an estimation as to whether the host operation was a P2P file exchange application. The definitions of the variety count distribution parameter A 112 e and the variety count distribution parameter B 112 f will be described later in detail in the explanation of the operation.

FIG. 8 is a diagram showing example contents for the estimation threshold value table 113. When an estimation is to be made for the operation of a host related to a flow that is included in traffic information received via the network 102, numerical values employed as references for the estimation of the type of the flow are entered in the estimation threshold value table 113. The estimation threshold value table 113 includes: a DDoS estimation threshold value 113 a, which is used as a reference for a determination as to whether the flow type is a DDoS attack; a network scan estimation threshold value 113 b, which is used as a reference for a determination as to whether the flow type is a network scan; and a P2P estimation, variety count distribution parameter A threshold value 113 c and a P2P estimation, variety count distribution parameter B threshold value 113 d, which are used as references for a determination as to whether the flow type is a P2P file exchange application.

Next, the operation of the traffic analysis apparatus 101 will be described while referring to the flowchart in FIG. 9. The controller 123 of the traffic analysis apparatus 101 performs the initialization process prior to the analysis process (step 901). Specifically, during the initialization process, the individual entries in the packet count table 108, the host table 111 and the P2P extraction table 112 are set in the initial state wherein no data are registered.

Following this, the controller 123 receives initial setup information 903 from the input/output device 103, and enters the initial setup information 903 in the extraction target table 109 and the estimation threshold value table 113 (step 904).

It should be noted that to obtain the information required to form the initial setup information 903, the input/output device 103 displays an initial setup information input screen 902 and then waits for a user input operation.

FIG. 14 is a diagram showing an example initial setup information input screen 902. The screen shown in FIG. 14 includes fields for a flow detection threshold value 902 a, a DDoS estimation threshold value 902 b, a network scan estimation threshold value 902 c, a P2P estimation parameter A 902 d and a P2P estimation parameter B 902 e. When an execution button 902 f is clicked on, the values entered in these fields are transmitted, as the initial setup information 903, to the traffic analysis apparatus 101, and the controller 123 writes the values included in this information 903 in the corresponding areas of the extraction target table 109 and the estimation threshold value table 113. Specifically, a value input as the flow detection threshold value 902 a is entered as the threshold values 109 c for all the entries in the extraction target table 109; a value input as the DDoS estimation threshold value 902 b is entered as the DDoS estimation threshold value 113 a for the estimation threshold value table 113; a value input as the network scan estimation threshold value 902 c is entered as the network scan estimation threshold value 113 b for the estimation threshold value table 113; a value input as the P2P estimation parameter A 902 d is entered as the P2P estimation, variety count distribution parameter A threshold value 113 c for the estimation threshold value table 113; and a value input as the P2P estimation parameter B 902 e is entered as the P2P estimation, variety count distribution parameter B threshold value 113 d for the estimation threshold value table 113.

When the process at step 904 has been completed, the traffic analysis apparatus 101 enters the waiting state for the reception of traffic information from the network 102. In this state, when the packet transmitter/receiver 105 of the traffic analysis apparatus 101 receives traffic information 905 via the network 102, the traffic information 905 is temporarily stored in the traffic information buffer 107. The traffic information 905, for example, is either a copy of a packet that is exchanged via the network 102 or an sFlow packet formed by summarizing portions of multiple packets that are sampled at appropriate intervals.

When the traffic information 905 has been stored in the traffic information buffer 107, the packet aggregating unit 120 begins the updating of the packet count table 108 (step 906). At step 906, for the individual packets that are included in the traffic information 905 stored in the traffic information buffer 107, the statistical process is performed using the packet count table 108. In addition, a flow to be focused on is extracted, the estimation process is performed for the operation of a host related to the flow, and based on the obtained results, the host table 111 and the P2P extraction table 112 are updated. The detailed process will be described later while referring to the flowchart in FIG. 10.

When, as a result of the operation performed at step 906, the contents of the host table 111 and the P2P extraction table 112 have been updated (step 907), the contents of the two tables are output to the input/output device 103, i.e., the host information output process is performed (step 908). During this process, the contents of the two tables are assembled as extracted information 909, and the extracted information 909 is transmitted to the input/output device 103 and displayed on a host information display screen 910.

FIG. 15 is a diagram showing an example host information display screen 910. In this example, entries in the host table 111 for which the host type 111 c indicates “server” are displayed in a server list 910 a; entries for which the host type 111 c indicates “client” are displayed in a client list 910 b; and entries in the P2P extraction table 112 for which the value in the P2P estimation result 112 d is “1”, which indicates that the host operation is estimated to be a P2P file exchange application, are displayed in the P2P file exchange host list 910 c. Specifically, in the server list 910 a and the client list 910 b, the values in the IP address 111 b, the service port number 111 d, the average band (OUT) 111 k and the average band (IN) 111 j, which are included in the entries for the host table 111, are respectively displayed, on the screen, in columns “IP address”, “port number”, “transmission band” and “reception band”. For an entry for which “1” is entered for the DDoS attack/network scan flag 111 l, a specific mark is displayed in column “DDOS” or “Scan”. In the P2P file exchange host list 910 c, values in the IP addresses 112 b of the entries for the P2P extraction table 112 are displayed in column “IP address”. Then, the host table 111 is searched for entries whose IP address 111 b matches the value in the IP address 112 b and whose host type 111 c is “server”. Thereafter, values in the service port numbers 111 d of the entries that are found are displayed in column “port number”.

The operation of the traffic analysis apparatus 101 has been described, and the processing at steps 906 through 908 is repeated each time traffic information 905 is received via the network 102.

The processing at step 906 for updating the packet count table 108 will now be described in detail while referring to the flowchart in FIG. 10.

At step 906, for packets included in the traffic information 905, the statistical process is performed for each designated item set in the extraction target table 109, to permit the packet count table 108 to reflect the obtained results. For this process, first, the packet aggregating unit 120 prepares a variable i for sequentially scanning the entries in the extraction target table 109, and initializes it to 1 (step 1001).

Then, the packet aggregating unit 120 obtains the value in the item set value 109 b included in the i-th entry of the extraction target table 109 (step 1002), and selects a use entry in the packet count table 108 for storing aggregation information for the item set to be aggregated, i.e., the item set that corresponds to the item set value 109 b for the packets stored in the traffic information buffer 107 (step 1003). As a specific selection method, for example, the values of the individual elements of an item set to be aggregated are linked together, a hash function, such as MD5, is applied to the obtained value, the resultant value is divided by the maximum entry count of the packet count table 108, and “1” is added to the remainder. The obtained value is employed as a use entry number.
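The selection method described above can be sketched as follows. The function name select_use_entry, the conversion of element values to strings and the "|" separator used when linking them are assumptions for illustration; the MD5 hash, the division by the maximum entry count and the addition of 1 to the remainder follow the description.

```python
import hashlib
from typing import Sequence


def select_use_entry(item_values: Sequence[object], max_entries: int) -> int:
    """Return a use entry number in the range 1..max_entries."""
    # Link the values of the individual elements of the item set together.
    # (The string form and the "|" separator are assumptions.)
    linked = "|".join(str(v) for v in item_values).encode("utf-8")
    # Apply a hash function such as MD5 to the linked value.
    hashed = int.from_bytes(hashlib.md5(linked).digest(), "big")
    # Divide by the maximum entry count and add 1 to the remainder.
    return hashed % max_entries + 1
```

For example, select_use_entry(("192.0.2.1", "198.51.100.7"), 4096) always yields the same entry number for the same pair of addresses, so packets of one flow are aggregated in one entry.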

As another selection method, a plurality of use entry choices are selected using multiple different calculation methods, and when all the selected entries are currently in use, the entry among them in which the minimum value is entered in the packet count 108 g is employed as the use entry. Using this method, information for a flow that frequently appears tends to remain, without multiple entries having to be prepared in the packet count table 108.
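A minimal sketch of this alternative selection method follows. The use of salted MD5 digests as the "multiple different calculation methods", the dictionary layout of the table, and the rule that an entry already holding the same item set is reused are all assumptions for illustration; the description states only that the candidate with the minimum packet count is chosen when every candidate is in use.

```python
import hashlib


def candidate_entries(item_values, max_entries, salts=(b"a", b"b", b"c")):
    """Compute several candidate entry numbers, one per (hypothetical) salt."""
    linked = "|".join(str(v) for v in item_values).encode("utf-8")
    return [
        int.from_bytes(hashlib.md5(salt + linked).digest(), "big") % max_entries + 1
        for salt in salts
    ]


def choose_use_entry(table, key, candidates):
    """Pick one entry number from the candidates.

    table: dict mapping entry number -> {"key": ..., "packet_count": int};
           an entry number absent from the dict is regarded as unused.
    """
    # Assumption: reuse an entry that already aggregates the same item set.
    for number in candidates:
        if number in table and table[number]["key"] == key:
            return number
    # Otherwise prefer any unused candidate.
    for number in candidates:
        if number not in table:
            return number
    # All candidates are in use: take the one with the minimum packet count.
    return min(candidates, key=lambda number: table[number]["packet_count"])
```

Since the entry with the smallest packet count is the one overwritten, entries for frequently appearing flows tend to survive, which matches the effect described above.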

Following this, the packet aggregation unit 120 compares the item set to be aggregated with the item set stored in the use entry, and verifies the contents of the use entry (step 1004). When the use entry is in the unused state, or when the item set to be aggregated is different from the item set stored in the use entry, the processing at steps 1005 to 1007 is performed.

The process performed at step 1005 is the initialization of the use entry. During this process, the item set value obtained at step 1002 is entered in the item set value 108 b of the use entry. As for the transmission source IP address 108 c, the destination IP address 108 d, the transmission source port number 108 e and the destination port number 108 f, element values included in the item set to be aggregated are set for those that are designated, in the item set value, as item set elements, while a value of “0” is set for those that are designated as elements to be used for counting the number of varieties. Further, a value of “0” is set for the packet count 108 g and the aggregated byte count 108 h, and the current time is set as the count start time 108 i.

The process performed at step 1006 is the updating of the auxiliary counting table 110. In the auxiliary counting table 110, the entry number of the use entry is entered in the field of the packet count table entry number 110 b of the entry whose item set value 110 a matches the item set value obtained at step 1002.

The process at step 1007 is a process for counting the number of varieties. This process will be described later in detail while referring to the flowchart in FIG. 11.

At step 1004, when the item set to be aggregated is the same as the item set stored in the use entry, it means that the use entry has already been employed for the statistical process for the item set to be aggregated, and therefore, the processing at steps 1005 to 1007 is not performed.

Subsequently, the packet aggregation unit 120 updates the counter information included in the use entry (step 1008). Specifically, the packet count 108 g is incremented by one, and the packet length is added to the aggregated byte count 108 h.

When, as a result of the process performed at step 1008, the value of the packet count 108 g equals the value in the threshold value 109 c in the i-th entry of the extraction target table 109 (step 1009), the packet estimation unit 122 performs a host information extraction process at step 1010. The host information extraction process is a process for estimating the operation of a host related to a flow that is identified by the item set, the packet count of which has reached the threshold value. The host table 111 and the P2P extraction table 112 reflect the results of this process. The detailed process will be described later while referring to the flowchart in FIG. 12.

Following this, the packet aggregation unit 120 increments the value of the variable i by one (step 1011), and repetitively performs the processing at steps 1002 to 1011 until the value of the variable i exceeds the total number of entries for the extraction target table 109 (step 1012). The processing at step 906 is thereafter terminated.
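The handling of a single use entry at steps 1004, 1005, 1008 and 1009 can be sketched as follows. The dictionary-based table and all field and function names are hypothetical stand-ins for the packet count table 108; the auxiliary table update (step 1006) and the variety counting (step 1007) are left to the caller, which is told whether the entry was newly initialized.

```python
def update_use_entry(table, entry_no, item_set_value, elements,
                     packet_len, threshold, now):
    """Update one use entry and report (is_new, reached_threshold).

    table:     dict mapping entry number -> entry dict (hypothetical layout).
    elements:  tuple of element values of the item set being aggregated.
    threshold: detection threshold for this item set value.
    now:       current time, recorded as the count start time.
    """
    entry = table.get(entry_no)
    # Step 1004: the entry is unused, or holds a different item set.
    is_new = (
        entry is None
        or entry["item_set_value"] != item_set_value
        or entry["elements"] != elements
    )
    if is_new:
        # Step 1005: initialize the use entry.
        entry = {
            "item_set_value": item_set_value,
            "elements": elements,
            "packet_count": 0,
            "byte_count": 0,
            "count_start_time": now,
        }
        table[entry_no] = entry
    # Step 1008: update the counters.
    entry["packet_count"] += 1
    entry["byte_count"] += packet_len
    # Step 1009: true only at the moment the count equals the threshold.
    return is_new, entry["packet_count"] == threshold
```

Comparing for equality rather than "greater than" means the host information extraction is triggered once per entry, at the packet that brings the count up to the threshold.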

The variety counting processing at step 1007 in FIG. 10 will now be described in detail while referring to the flowchart in FIG. 11.

First, the governing principle for the counting of the number of varieties will be briefly described. Assume that the number of varieties of transmission source IP addresses is to be counted for a first flow whose item set consists, for example, of a destination IP address and a destination port number. Assume also that a new entry in the packet count table 108 has been prepared for a second flow, that is, for an item set that employs the same values as those in the first flow for the destination IP address and the destination port number and that includes a transmission source IP address as a third element. In this case, the number of varieties of transmission source IP addresses for the first flow can be obtained by incrementing it by one. Whether a new second flow has appeared can be easily determined by performing the process at step 1004 in the flowchart in FIG. 10. Step 1007 is performed only when a new second flow has appeared, and corresponds to the portion that actually performs the counting process.

At step 1007, the variety aggregation unit 121 prepares a variable j that is used for sequentially scanning the elements in the variety count updating targeted item set list 109 d for the entry, in the extraction target table 109, that was being processed when the process at step 1007 was initiated. The variety aggregation unit 121 then initializes the variable j to “1” (step 1101).

Subsequently, from the variety count updating targeted item set list 109 d for the entry, in the extraction target table 109, that was being processed when the process at step 1007 was initiated, the variety aggregation unit 121 extracts the j-th element, and regards this value as “x” (step 1102). When “x” is not 0, the processing continues from step 1104; when “x” is 0, the process at step 1007 is terminated.

At step 1104, the variety aggregation unit 121 searches the auxiliary counting table 110 for an entry whose item set value 110 a is the same as “x”, extracts, from the entry, the value in the packet count table entry number 110 b, and regards this value as “y”. When the value of “y” is not 0, the process at step 1106 is performed, but when the value of “y” is 0, the process at step 1106 is skipped.

At step 1106, the variety aggregation unit 121 adds “1” to the number of varieties for the pertinent item in the entry of the packet count table 108 whose entry number 108 a is “y”. This item corresponds to the bit in which “x” differs from the item set value 109 b of the entry, in the extraction target table 109, that was being processed when step 1007 was initiated. This bit can be easily obtained by calculating the exclusive logical sum (exclusive OR) of the item set value 109 b and “x”.

Following this, the variety aggregation unit 121 increments the variable j by one, and returns to step 1102 and repeats the processing there (step 1107). In this manner, the number of varieties is counted.
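The loop of steps 1101 through 1107 can be sketched as below. The bit assignment (8 for the transmission source IP address, 4 for the destination IP address, 2 for the transmission source port number, 1 for the destination port number) is inferred from the hexadecimal item set values given for the flow types in this embodiment; the dictionary layouts and names are assumptions for illustration.

```python
# Bit assignment inferred from the item set values 05, 06, 08, 09 and 0a.
SRC_IP, DST_IP, SRC_PORT, DST_PORT = 0x8, 0x4, 0x2, 0x1


def count_varieties(item_set_value, target_list, aux_table, packet_count_table):
    """Increment variety counters for a newly appeared item set.

    item_set_value:     item set value of the entry being processed.
    target_list:        variety count updating targeted item set list,
                        terminated by 0.
    aux_table:          dict mapping item set value -> packet count table
                        entry number (0 means "no entry").
    packet_count_table: dict mapping entry number -> {"varieties": {bit: count}}.
    """
    for x in target_list:          # steps 1101, 1102 and 1107
        if x == 0:                 # end of the list: terminate
            break
        y = aux_table.get(x, 0)    # step 1104
        if y == 0:                 # no entry to update: skip step 1106
            continue
        item_bit = item_set_value ^ x   # exclusive OR isolates the extra item
        varieties = packet_count_table[y]["varieties"]
        varieties[item_bit] = varieties.get(item_bit, 0) + 1   # step 1106
```

For example, when a new entry appears for item set value 0x0D (transmission source IP address, destination IP address and destination port number) and the list contains 0x05, the exclusive OR yields 0x08, so the transmission source IP address variety count of the 0x05 flow is incremented.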

The host information extraction processing at step 1010 in FIG. 10 will now be described in detail while referring to the flowchart in FIG. 12.

The packet estimation unit 122 determines the type of a beyond-threshold flow based on the item set value 108 b for the entry, in the packet count table 108, the packet count of which has exceeded the threshold value as a result of the counter updating process performed at step 1008 (step 1201). In this embodiment, a flow type, the item set value of which, in hexadecimal, is 05 (the elements of an item set are a destination IP address and a destination port number) or 0a (the elements of an item set are a transmission source IP address and a transmission source port number), is defined as a server flow. A flow type, the item set value of which, in hexadecimal, is 06 (the elements of an item set are a destination IP address and a transmission source port number) or 09 (the elements of an item set are a transmission source IP address and a destination port number), is defined as a client flow. A flow type, the item set value of which, in hexadecimal, is 08 (the element of an item set is a transmission source IP address), is defined as a P2P estimation flow. However, the flow types, in this case, are not limited to these three, and another flow type may be defined for a different combination of elements.
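The mapping from item set value to flow type at step 1201 can be written as a small lookup. The function name and the returned labels are hypothetical; the hexadecimal values are those given above.

```python
def flow_type(item_set_value):
    """Classify a beyond-threshold flow by its item set value (step 1201)."""
    if item_set_value in (0x05, 0x0A):
        return "server"          # host/port pair on one side of the flow
    if item_set_value in (0x06, 0x09):
        return "client"
    if item_set_value == 0x08:
        return "p2p_estimation"  # transmission source IP address only
    return None                  # no flow type defined for this combination
```

A return value of None corresponds to the case in which the processing at step 1010 is terminated without updating any table.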

When the determination is that the flow type is a server flow or a client flow, the process at steps 1203 through 1206, for updating the host table 111, is performed. But when the determination is that the flow type is a P2P estimation flow, the P2P file exchange host estimation process at steps 1207 through 1211 is performed. In any other case, the processing at step 1010 is terminated (step 1202).

To update the host table 111, first, the packet estimation unit 122 performs an examination to determine whether information for a host related to the beyond-threshold flow has already been registered in the host table 111 (step 1203). When the host information has not yet been registered, this information is newly registered in an unused entry in the host table 111 (step 1204). For the new registration process, information included in the entry, in the packet count table 108, the packet count of which has exceeded the threshold value, is employed, and values are set in the individual fields of the IP address 111 b, the host type 111 c, the service port number 111 d and the detection threshold value 111 e of the unused entry.

The information in the entry that is found at step 1203, or that is newly registered at step 1204, is updated in accordance with the information included in the entry, in the packet count table 108, the packet count of which has exceeded the threshold value (step 1205). The fields to be updated are: the sender count (IN) 111 f, the recipient count (OUT) 111 g, the measured period (IN) 111 h, the measured period (OUT) 111 i, the average band (IN) 111 j, the average band (OUT) 111 k and the latest update time 111 m.

Finally, the estimation process is performed to estimate whether the host operation indicated in the entry updated at step 1205 is a DDoS attack or a network scan. The estimation result is entered in the DDoS attack/network scan flag 111 l. Thereafter, the host table 111 updating process is terminated (step 1206).

The method of this embodiment used to estimate that a host operation is a DDoS attack or a network scan will now be described.

First, a DDoS attack is an activity in which multiple attacking hosts issue access requests to a port number used by a specific host to provide a service. Packets transmitted by these attacking hosts are detected as server flows, and since each of the IP addresses of the attacking hosts is different, it is assumed that the number of varieties of transmission source IP addresses for the server flows is similar to the detection threshold value for server flows. Therefore, an estimation threshold value, which is used to estimate whether or not a server flow that is detected is a DDoS attack, is defined as a ratio of the number of varieties to the detection threshold value. Thus, when the ratio of the number of varieties for the transmission source IP address of the server flow relative to the detection threshold value is greater than the estimation threshold value that has been defined, it is estimated that the pertinent server flow is a DDoS attack. At step 1206, a value stored in the DDoS estimation threshold value 113 a of the estimation threshold value table 113 is employed as the estimation threshold value that is defined.

Similarly, a network scan is an activity in which a specific host, in order to search for servers that are providing a service using the same port number, issues access requests to multiple different IP addresses by using the same destination port number. Packets transmitted by the host are detected as a client flow, and the number of varieties for the destination IP address of the client flow is regarded as being similar to the detection threshold value for client flows. Therefore, an estimation threshold value, which is used to estimate whether a client flow that is detected is a network scan, is defined as a ratio of the number of varieties for the destination IP address relative to the detection threshold value. When the ratio of the number of varieties for the destination IP address of the client flow relative to the detection threshold value is greater than the estimation threshold value that has been defined, it is estimated that the pertinent client flow is a network scan. At step 1206, a value stored in the network scan estimation threshold value 113 b of the estimation threshold value table 113 is employed as the estimation threshold value that is defined.
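Both estimations reduce to the same comparison of a variety-to-threshold ratio against an estimation threshold value, which the following sketch makes explicit. The function name and the numeric values in the usage note are hypothetical.

```python
def exceeds_variety_ratio(variety_count, detection_threshold, estimation_threshold):
    """Return True when variety_count / detection_threshold exceeds the
    estimation threshold value.

    For a server flow, variety_count is the number of varieties of
    transmission source IP addresses (DDoS attack estimation).
    For a client flow, it is the number of varieties of destination
    IP addresses (network scan estimation).
    """
    return variety_count / detection_threshold > estimation_threshold
```

As a purely illustrative case, with a detection threshold of 1000 packets and an estimation threshold value of 0.8, a server flow reached by 950 distinct transmission source IP addresses would be estimated to be a DDoS attack, whereas one reached by 100 distinct addresses would not.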

Next, the P2P file exchange host estimation processing, beginning at step 1207, will be described.

In the P2P file exchange host estimation processing, first, the packet estimation unit 122 determines whether information concerning a host related to the beyond-threshold flow has already been registered as a server in the host table 111 (step 1207). This process is performed based on the idea that when the host is a P2P file exchange host, a server flow always appears, and therefore, as a necessary requirement for estimating that the host is a P2P file exchange host, the host should already have been registered as a server in the host table 111. When it is not confirmed at step 1207 that the host has already been registered as a server in the host table 111, the processing at step 1010 is terminated without any further processes being performed.

When it is confirmed at step 1207 that the host has already been registered as a server in the host table 111, the packet estimation unit 122 examines the P2P extraction table 112 to determine whether information for a host related to the beyond-threshold flow has already been registered (step 1208). When such information has not yet been registered, the information is newly registered in an unused entry of the P2P extraction table 112 (step 1209). The new registration process is a process during which, based on information included in the beyond-threshold entry in the packet count table 108, a value is set in the IP address 112 b of the unused entry, and a value of “0” is set in the P2P estimation flow detection count 112 c and the P2P estimation results 112 d.

Subsequently, the information in the entry that is found at step 1208, or the information newly registered in the entry at step 1209, is updated using the information in the beyond-threshold entry in the packet count table 108 (step 1210). Specifically, the value in the P2P estimation flow detection count 112 c is incremented by one, and the variety count distribution parameter A 112 e and the variety count distribution parameter B 112 f are calculated.

Prior to explaining the definitions for the variety count distribution parameter A 112 e and the variety count distribution parameter B 112 f, the P2P file exchange host estimation method for this embodiment will be described.

FIG. 13 is a schematic diagram illustrating P2P file exchange flows. In FIG. 13, a host 1301 is currently performing a P2P file exchange, while n hosts 1302 serve as servers for the host 1301 and m hosts 1303 serve as clients for the host 1301, each of which is also currently performing a P2P file exchange. As the P2P file exchange protocol assumed in this embodiment, a service port number used when a host is operated as a server is determined at random for each host. With this arrangement, for a P2P estimation flow that is detected because it exceeds the threshold value and that employs the host 1301 as a transmission source IP address, the ratio of about (n+m):n:(n+m) is obtained as a ratio of three values, i.e., the number of varieties of destination IP addresses, the number of varieties of transmission source port numbers and the number of varieties of destination port numbers. In the case of n=m, for example, when multiple P2P estimation flows are detected, so long as the ratio of 2:1:2 is obtained for all of the flows, this ratio can be used as one of the bases for estimating that the host 1301 is performing a P2P file exchange. However, actually, since n and m change as time elapses, it is assumed that the ratio includes a fluctuation for each flow. While taking this point into account, in this embodiment, the following estimation method is employed. When a plurality of P2P estimation flows that include the same transmission source IP address are received, a ratio (a first ratio) of the number of varieties of destination IP addresses to the number of varieties of destination port numbers, and a ratio (a second ratio) of the number of varieties of destination IP addresses to the number of varieties of transmission source port numbers are calculated. When the first ratio is distributed near a value of “1” and the second ratio is distributed near a value of 0.5, and when the host having the transmission source IP address has already been registered as a server in the host table 111, it is estimated that the flow is a P2P file exchange flow.

Specifically, in this embodiment, a method for employing the least-squares method to calculate the ratio and the degree of variance is employed to perform the estimation. In order to confirm the first ratio, using the least-squares method, the detection results for multiple P2P estimation flows are approximated with linear function y=ax+b, where x denotes the variety count of destination port numbers and y denotes the variety count of destination IP addresses, and the values of a and b and the value of a correlation coefficient c are obtained. Then, whether these values are included in a predetermined range is determined. In this manner, the first ratio is confirmed. The combination of a, b and c obtained through calculation is the variety count distribution parameter A 112 e. Similarly, for the confirmation of the second ratio, using the least-squares method, the detection results for multiple P2P estimation flows are approximated with linear function y=ax+b, where x denotes the variety count of transmission source port numbers and y denotes the variety count of destination IP addresses, and the values of a and b and the value of a correlation coefficient c are obtained. Then, whether these values are included within a predetermined range is determined. In this manner, the second ratio is confirmed. The combination of the values a, b and c obtained through calculation is the variety count distribution parameter B 112 f.
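The least-squares calculation of a, b and the correlation coefficient c can be sketched with the standard closed-form formulas. The function name and the sample values are hypothetical, and the sketch assumes at least two detection results with non-constant x and y values so that no division by zero occurs.

```python
import math


def fit_variety_counts(xs, ys):
    """Fit y = a*x + b by least squares and return (a, b, c).

    xs, ys: variety counts collected from multiple P2P estimation flows,
            for example destination port number variety counts (xs) and
            destination IP address variety counts (ys).
    c is the Pearson correlation coefficient of the two series.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                    # slope
    b = mean_y - a * mean_x          # intercept
    c = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    return a, b, c
```

For instance, if three detected flows show equal destination port number and destination IP address variety counts, such as (10, 10), (20, 20) and (30, 30), the fit gives a slope of 1, an intercept of 0 and a correlation coefficient of 1, which corresponds to a first ratio distributed at “1” with no fluctuation.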

Finally, the packet estimation unit 122 determines whether the values, obtained at step 1210, of the variety count distribution parameter A 112 e and the variety count distribution parameter B 112 f are respectively included in ranges designated in the P2P estimation variety count distribution parameter A threshold value 113 c and the P2P estimation variety count distribution parameter B threshold value 113 d of the estimation threshold value table 113. When the values are included in the ranges, it is estimated that the host is a P2P file exchange host, and the value in the P2P estimation results 112 d for the pertinent entry is changed to 1 (step 1211).

The operation of the traffic analysis apparatus 101 of the first embodiment has been described.

[Second Embodiment]

FIG. 16 is a diagram showing the configuration of a traffic analysis apparatus 201 according to a second embodiment of the present invention. A difference in the traffic analysis apparatus 201 in FIG. 16 from the traffic analysis apparatus 101 of the first embodiment is that a P2P extraction table 212 (FIG. 17) and an estimation threshold value table 213 (FIG. 18) are included in a memory 106 (the names of the tables are the same, but the table contents are different). A network 102, an input/output device 103, a packet transmitter/receiver 105, the memory 106, a traffic information buffer 107, a packet count table 108, an extraction target table 109, an auxiliary counting table 110, a host table 111, a packet aggregating unit 120, a variety aggregating unit 121, a packet estimation unit 122 and a controller 123 are the same as those in FIG. 1 for the first embodiment.

FIG. 17 is a diagram showing example contents of the P2P extraction table 212. The P2P extraction table 212 includes a plurality of entries for storing information that is required for estimating whether the operation of a host related to a targeted flow, which is extracted from traffic information received via the network 102, is the P2P file exchange application, and for storing results obtained through the estimation. One entry includes: an entry number 212 a, which is used to uniquely identify the entry; an IP address 212 b of the host; a P2P estimation flow detection count 212 c, which indicates the number of detected P2P estimation flows that are employed for estimating whether the host operation is a P2P file exchange application; P2P estimation results 212 d, indicating the results of an estimation as to whether the operation is a P2P file exchange application; and a DIP variety count average 212 e and a DPT variety count average 212 f, which are obtained based on numerical values included in the statistical values of an extracted P2P estimation flow in order to estimate whether the host operation is a P2P file exchange application. The definitions of the DIP variety count average 212 e and the DPT variety count average 212 f will be described in detail in the following description of the operation.

FIG. 18 is a diagram showing one example of the estimation threshold value table 213. Values entered in the estimation threshold value table 213 are those employed as references to determine a flow type in a process performed for estimating the operation of a host that is related to the flow, which is included in traffic information received via the network 102. The estimation threshold value table 213 includes: a DDoS estimation threshold value 213 a, which is used as a reference for determining whether or not the type of the flow is a DDoS attack; a network scan estimation threshold value 213 b, which is used as a reference for determining whether or not the type of the flow is a network scan; and a P2P estimation DIP variety count threshold value 213 c and a P2P estimation DPT variety count threshold value 213 d, which are used as references for determining whether the flow type is a P2P file exchange type.

Next, the operation of the traffic analysis apparatus 201 of this embodiment will be described. Among the traffic information received via the network 102, the traffic analysis apparatus 201 of this embodiment employs, as a packet for the statistical process, only a TCP SYN packet that represents a communication start request, and performs an estimation for a P2P file exchange host using a method that is different from the one shown in the first embodiment. The operation of the traffic analysis apparatus 201 will now be described while referring to the flowchart in FIG. 19.

The controller 123 of the traffic analysis apparatus 201 performs the initialization process prior to the analysis process (step 1901). Specifically, in the initialization process, entries that form the packet count table 108, the host table 111 and the P2P extraction table 212 stored in the memory 106 are set to the initial state where no data are registered.

Then, the controller 123 receives initial setup information 1903 from the input/output device 103, and enters the initial setup information 1903 in the extraction target table 109 and in the estimation threshold value table 213 (step 1904).

At this time, in order to obtain information required to form the initial setup information 1903, the input/output device 103 displays an initial setup information input screen 1902 and waits for a user input operation.

FIG. 21 is a diagram showing an example initial setup information input screen 1902. The screen 1902 in the example in FIG. 21 includes fields for a flow detection threshold value 1902 a, a DDoS estimation threshold value 1902 b, a network scan estimation threshold value 1902 c, a P2P estimation DIP variety count threshold value 1902 d and a P2P estimation DPT variety count threshold value 1902 e. When an execution button 1902 f is clicked on, input values in the individual fields are transmitted as the initial setup information 1903 to the traffic analysis apparatus 201, and the controller 123 writes the values included in the initial setup information 1903 in the corresponding areas of the extraction target table 109 and the estimation threshold value table 213. Specifically, an input value in the flow detection threshold value 1902 a is entered in the threshold values 109 c of all the entries in the extraction target table 109. An input value in the DDoS estimation threshold value 1902 b is entered in the DDoS estimation threshold value 213 a of the estimation threshold value table 213. An input value in the network scan estimation threshold value 1902 c is entered in the network scan estimation threshold value 213 b of the estimation threshold value table 213. An input value in the P2P estimation DIP variety count threshold value 1902 d is entered in the P2P estimation DIP variety count threshold value 213 c of the estimation threshold value table 213. And an input value in the P2P estimation DPT variety count threshold value 1902 e is entered in the P2P estimation DPT variety count threshold value 213 d of the estimation threshold value table 213.

When the process at step 1904 has been completed, the traffic analysis apparatus 201 enters a wait state for the reception of traffic information from the network 102. In this state, when the packet transmitter/receiver 105 receives traffic information 1905 via the network 102, the traffic information 1905 is temporarily stored in the traffic information buffer 107. The traffic information 1905 is, for example, a copy of the packets that are exchanged via the network 102, or an sFlow packet formed by summarizing the portions of multiple packets that are sampled at appropriate intervals.

When the traffic information 1905 is stored in the traffic information buffer 107, the controller 123 determines whether packets included in the traffic information 1905 are TCP SYN packets (step 1906).
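The determination at step 1906 can be sketched as a test on the IP protocol number and the TCP flag bits. The function name is hypothetical. Excluding packets that also carry the ACK flag is an assumption: it restricts the test to communication start requests, so that SYN/ACK responses are not counted.

```python
IPPROTO_TCP = 6   # IP protocol number assigned to TCP
TCP_FLAG_SYN = 0x02
TCP_FLAG_ACK = 0x10


def is_connection_request(ip_protocol, tcp_flags):
    """Return True for a TCP packet with SYN set and ACK clear."""
    return (
        ip_protocol == IPPROTO_TCP
        and bool(tcp_flags & TCP_FLAG_SYN)
        and not (tcp_flags & TCP_FLAG_ACK)
    )
```

With this filter, only the packets by which a host opens connections toward servers reach the statistical process.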

When the packets are TCP SYN packets, the packet aggregating unit 120 starts updating the packet count table 108 (step 1907). At step 1907, only when the packets that are included in the traffic information and that are stored in the traffic information buffer 107 are TCP SYN packets, the statistical process is performed using the packet count table 108, and a flow to be focused on is extracted. Further, the estimation process for the operation of a host related to the flow is performed, and based on the obtained results, the host table 111 and the P2P extraction table 212 are updated. This processing will be described later in detail.

When, as a result of the process performed at step 1907, the host table 111 and the P2P extraction table 212 are updated (step 1908), the controller 123 outputs the contents of these two tables to the input/output device 103, i.e., performs a host information output process (step 1909). For this process, the contents of the two tables are formed as extracted information 1910, and the extracted information 1910 is transmitted to the input/output device 103, while a host information display screen 1911 is displayed.

FIG. 22 is a diagram showing an example host information display screen 1911. In this example, entries for which “server” is entered in the host type 111 c of the host table 111 are displayed in a server list 1911 a, and entries for which “client” is entered in the host type 111 c are displayed in a client list 1911 b. Further, an entry for which “1”, indicating that a host operation is estimated to be a P2P file exchange application, is entered in the P2P estimation result 212 d of the P2P extraction table 212, is displayed in a P2P file exchange host list 1911 c. On the server list 1911 a and the client list 1911 b, the values in an IP address 111 b, a service port number 111 d and a sender count (IN) 111 f or a recipient count (OUT) 111 g, included in the entries in the host table 111, are displayed in columns “IP address”, “port number”, “client count” and “server count”. For the entry for which “1” is entered in the DDoS attack/network scan flag 111 l, a mark is displayed in column “DDOS” or “Scan”. In the P2P file exchange host list 1911 c, the value in the IP address 212 b included in the entry in the P2P extraction table 212 is displayed in column “IP address”. Further, the host table 111 is examined to find an entry that has, as an IP address, the value in the IP address 212 b and that employs “server” as a host type, and the value in the service port number 111 d of the entry that is found is displayed in column “port number”.

The operation of the traffic analysis apparatus 201 has been described. The processing at steps 1906 to 1909 is repetitively performed each time the traffic information 1905 is received via the network 102.

The process at step 1907 for updating the packet count table 108 will now be described in detail. The process at step 1907 is basically the same as the process at step 906 performed by the traffic analysis apparatus 101 of the first embodiment, and the detailed processing is as shown in the flowcharts in FIGS. 10, 11 and 12. Since the second embodiment differs from the first embodiment only at steps 1210 and 1211 in the flowchart in FIG. 12, only this portion will be described.

In this embodiment, during the process performed at step 1210, the packet estimation unit 122 employs information included in an entry in the packet count table 108, for which a packet count has exceeded a threshold value in the process at step 1008 in the flowchart in FIG. 10, and updates the information included in an entry that is found at step 1208, or an entry that is newly registered at step 1209. Specifically, the P2P estimation flow detection count 212 c is incremented by one, and the DIP variety count average 212 e and the DPT variety count average 212 f are calculated.

Here, prior to explaining the definitions of the DIP variety count average 212 e and the DPT variety count average 212 f, the P2P file exchange host estimation method of this embodiment will be described.

FIG. 20 is a schematic diagram showing flows for P2P file exchange. In FIG. 20, a host 2001 is currently performing a P2P file exchange, while n hosts denoted by 2002 serve as servers relative to the host 2001, and m hosts denoted by 2003 serve as clients relative to the host 2001, all of which are currently performing P2P file exchanges. As a P2P file exchange protocol assumed in this embodiment, a service port number used when a host is operated as a server is determined at random for each host, and a detection threshold value sufficiently larger than n and m is designated in the threshold value 109 c of the extraction target table 109. With this arrangement, for a P2P estimation flow that is detected because it exceeds the threshold value and that employs the host 2001 as a transmission source IP address, a value of approximately n is obtained for the destination IP address variety count and the destination port number variety count. This is because at step 1906 in the flowchart in FIG. 19, a packet used for the statistical process is limited to a TCP SYN packet, and in the arrangement in FIG. 20, all TCP SYN packets transmitted by the host 2001 are forwarded only to the hosts 2002. Therefore, in this embodiment, the following method is employed. When the destination IP address variety count and the destination port number variety count are sufficiently great that these values can be regarded as the number of servers accessed for P2P file exchanges, and when, in addition, the host having the transmission source IP address is entered as a server in the host table 111, it is estimated that the flow is a P2P file exchange flow.

Specifically, in this embodiment, as a method for performing the above described estimation, an average variety count for the destination IP addresses and an average variety count for the destination port numbers, included in the P2P estimation flows that are extracted, are calculated. Then, these averages are compared with estimation threshold values that are designated in advance. When the averages are greater than the threshold values, it is estimated that the flow is a P2P file exchange flow. The average of the destination IP address variety count and the average of the destination port number variety count are, respectively, the DIP variety count average 212 e and the DPT variety count average 212 f; and the estimation threshold values are the P2P estimation DIP variety count threshold value 213 c and the P2P estimation DPT variety count threshold value 213 d. The comparison process and the process for reflecting the estimation results in the P2P estimation results 212 d of the P2P extraction table 212 correspond to the process at step 1211 of this embodiment.
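The averaging and comparison of the second embodiment can be sketched as follows. Maintaining each average incrementally from the previous average and the flow detection count is an assumption made for illustration, as are the function names; the description specifies only that the averages are calculated and compared with the designated threshold values.

```python
def update_average(previous_average, previous_count, new_value):
    """Fold one more variety count into a running average.

    previous_count is the number of P2P estimation flows already
    reflected in previous_average.
    """
    return (previous_average * previous_count + new_value) / (previous_count + 1)


def is_p2p_file_exchange(dip_average, dpt_average,
                         dip_threshold, dpt_threshold,
                         registered_as_server):
    """Estimate a P2P file exchange host (second embodiment).

    Both averages must exceed their threshold values, and the host must
    already be registered as a server in the host table.
    """
    return (
        registered_as_server
        and dip_average > dip_threshold
        and dpt_average > dpt_threshold
    )
```

Compared with the least-squares method of the first embodiment, this check needs only two stored averages per host, at the cost of relying on the TCP SYN restriction to keep the counts meaningful.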

The operation of the traffic analysis apparatus 201 for the second embodiment of the present invention has been described.

It should be further understood by those skilled in the art that although the foregoing description has been made on embodiments of the invention, the invention is not limited thereto and various changes and modifications may be made without departing from the spirit of the invention and the scope of the appended claims.

1. A traffic analysis apparatus, for analyzing traffic consisting of packets that flow via a network, comprising: a packet transmitter/receiver that transmits and receives packets that flow via the network, each of the packets having a plurality of items; a packet aggregating unit that counts the number of packets having predetermined values in respective items of a first item group, the first item group being selected from said plurality of items; a variety aggregating unit that counts the number of different values in at least one item selected from said plurality of items, but not included in the first item group; an analysis information storage unit that stores results obtained by analyzing the packets, the results including: an item set value that identifies a flow type; a total value of packets aggregated by the packet aggregating unit; and the number of different values counted by the variety aggregating unit; and a packet estimation unit that, when a total number of the packets counted by said packet aggregating unit exceeds a threshold value, estimates characteristics for an item set, the item set including a specific combination of packets, wherein said packet estimation unit estimates the characteristics for the item set based on the number of packets, the values of said respective items included in the first item group, and the number of different values counted by the variety aggregating unit, wherein said respective items include a transmission source address, and wherein said packet estimation unit determines the flow type based on the item set value for which the total number of the packets counted by said packet aggregating unit exceeds the threshold value, wherein said packet estimation unit determines that the flow type is a P2P (peer-to-peer) file exchange flow when the number of packets having the same value for the item of said transmission source address exceeds a predetermined threshold value, and when the ratio of the number of different values counted by said variety aggregating unit for items of a destination IP (Internet Protocol) address, a transmission source port number, and a destination port number, which are not included in said first item group, matches a predetermined ratio.
2. The traffic analysis apparatus according to claim 1, wherein the variety aggregating unit counts the number of different values for items that do not belong to a specific item group consisting of an arbitrary item or multiple items in packets received by the packet transmitter/receiver, at the first appearance of packets having the same value in an arbitrary item or multiple items of an item group consisting of items which do not belong to the specific item group.
3. The traffic analysis apparatus according to claim 1, wherein the packet aggregating unit uses a predetermined number of sets of analysis information included in the analysis information storage unit, and wherein when candidate analysis information that is included in the analysis information storage unit to be used for a new item group is currently used by a different item group, analysis information used by the different item group is abandoned, and the analysis information is used as information analysis information for the new item group.
4. The traffic analysis apparatus according to claim 3, wherein the packet aggregating unit selects a plurality of candidates as the analysis information to be used for the new item group, and when all the candidates are currently used by different item groups, analysis information is selected for which the minimum value is stored in the analysis information storage unit.
5. The traffic analysis apparatus according to claim 1, wherein the packet transmitter/receiver receives sampled packets via the network.
6. The traffic analysis apparatus according to claim 1, wherein the packet aggregating unit counts, among the packets received by the packet transmitter/receiver, the number of packets having the same value in each item of a second item group comprising items of a destination IP address and a destination port number and the number of packets having the same value in each item of a third item group comprising items of a transmission source IP address and a transmission source port number, wherein the traffic analysis apparatus further comprises a host information storage unit storing therein the value of the destination IP address included in the second item group or the value of the transmission source IP address included in the third item group when the counted number of packets having the same value in each item included in the second item group or the counted number of packets having the same value in each item included in the third item group exceeds a predetermined threshold value, and wherein the packet estimation unit estimates that the flow is a P2P file exchange flow when the ratio of the numbers of different values counted by said variety aggregating unit for the items of the destination IP address, the transmission source port number, and the destination port number which are not included in said first item group matches a predetermined ratio and the value of the transmission source IP address included in the first item group has been stored in said host information storage unit.
7. The traffic analysis apparatus according to claim 1, wherein said packet estimation unit performs a host information extraction process, and wherein said host information extraction process is a process for estimating the operation of a host related to the flow type that is determined by said packet estimation unit.
8. A traffic analysis method, for a traffic analysis apparatus that analyzes traffic consisting of packets that flow via a network, the traffic analysis apparatus comprising a processor, the method comprising: receiving, by the traffic analysis apparatus, packets, each of the packets having a plurality of items, that flow via the network; counting, by the traffic analysis apparatus, the number of packets having predetermined values in respective items of a first item group selected from said plurality of items, counting, by the traffic analysis apparatus, the number of different values in at least one item selected from said plurality of items, but not included in said first item group; storing, by the traffic analysis apparatus, results obtained by analyzing the packets, the results including: an item set value that identifies a flow type; a total value of packets aggregated by the traffic analysis apparatus; and the number of different values counted by the traffic analysis apparatus; and when a total number of the packets counted by the traffic analysis apparatus exceeds a threshold value, producing, by the traffic analysis apparatus, characteristics for an item set, the item set including a specific combination of packets, wherein the characteristics for the item set are estimated in accordance with the number of packets counted in the step of counting the number of packets, and the number of different values counted in the step of counting the number of different values, wherein said respective items include a transmission source address; and determining, by the traffic analysis apparatus, the flow type based on the item set value for which the total number of the packets counted by said traffic analysis apparatus exceeds the threshold value; and determining that the flow is a P2P (peer-to-peer) file exchange flow when the number of packets having the same value for the item of said transmission source address exceeds a predetermined threshold value, and when the ratio of the number of different values counted by said variety aggregating unit for items of a destination IP (Internet Protocol) address, a transmission source port number, and a destination port number, which are not included in said first item group, matches a predetermined ratio.
9. The traffic analysis method according to claim 8, further comprising: counting, among the received packets, the number of packets having the same value in each item of a second item group comprising items of a destination IP address and a destination port number and the number of packets having the same value in each item of a third item group comprising items of a transmission source IP address and a transmission source port number; storing, in a host information storage unit, the value of the destination IP address included in the second item group or the value of the transmission source IP address included in the third item group when the counted number of packets having the same value in each item included in the second item group or the counted number of packets having the same value in each item included in the third item group exceeds a predetermined threshold value; and estimating that the flow is a P2P file exchange flow when the ratio of the numbers of different values for items of the destination IP address, the transmission source port number, and the destination port number which are not included in said first item group matches a predetermined ratio and the value of the transmission source IP address included in the first item group has been stored in said host information storage unit.
10. The traffic analysis method according to claim 8, wherein said traffic analysis apparatus performs a host information extraction process, and wherein said host information extraction process is a process for estimating the operation of a host related to the flow type that is determined by said traffic analysis apparatus.